36 research outputs found

A Log Domain Pulse Model for Parametric Speech Synthesis

Author: Degottex Gilles
Gales Mark
Lanchantin Pierre
Publication venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication date: 01/01/2018

Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder. One of the main causes of degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer, without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it adopts a combination of speech components that are additive in the log domain. Also, the same representation is used for voiced and unvoiced segments, rather than relying on binary voicing decisions. This avoids the voicing error discontinuities that can occur in many current vocoders. A simple binary mask is used to denote the presence of noise in the time-frequency domain, which is less sensitive to classification errors. Four experiments have been carried out to evaluate this new model. The first experiment examines the noise reconstruction issue. Three listening tests have also been carried out that demonstrate the advantages of this model: comparison with the STRAIGHT vocoder; the direct prediction of the binary noise mask by using a mixed output configuration; and partial improvements of creakiness using a mask correction mechanism.

Funding: European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie; 10.13039/501100000266-EPSR

Apollo (Cambridge)

The case of Digital Writing in Instant Messaging: When cyber written productions are closer to the oral code than the written code.

Author: Lanchantin Tonia
Largy Pierre
Simoës-Perlant Aurélie
Publication venue: PsychNology Journal
Publication date: 20/12/2012

The use of New Information and Communication Technologies (NICTs) has deeply changed traditional reading and writing practices. It thus seems necessary to provide a definition of Digital Writing in Instant Messaging (DWIM) to better understand its grammatical, lexical and syntactic characteristics (the last two components define the traditional characteristics of both oral and written codes). Thirty-two French-speaking students around the age of 13, enrolled in 8th grade, produced one hour of DWIM on an instant messaging website in groups of two. They were able to use as many cyber languages as they wanted (referred to here as digital writing). This corpus showed that this written form is closer to the oral code than to the written code (the studied population developed their language skills in constant contact with the written language in its dual form). Indeed, we showed for instance that users of DWIM sometimes produced repetitions (which are forbidden in traditional writing), never used subject-verb inversion in interrogative sentences, could replace punctuation with emoticons, and used undefined deictic expressions in their sentences. We were also able to show that having traditional reading and writing habits is not sufficient to create a predisposition towards the use of the DWIM code.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Segmentation d'images multispectrales par arbre de Markov caché flou

Author: LANCHANTIN Pierre
SALZENSTEIN Fabien
Publication venue: GRETSI, Groupe d’Etudes du Traitement du Signal et des Images
Publication date: 01/01/2005

We define a new unsupervised statistical segmentation tool based on a fuzzy hidden Markov tree model. Our fuzzy model combines the probabilistic uncertainty of the observed data with discrete and continuous thematic classes that represent the imprecision of the hidden data. The Bayesian segmentation technique implemented corresponds to the MPM (Mode of Posterior Marginals) criterion. Our approach allows, on the one hand, the processing of objects containing diffuse structures, as is the case in astronomical imaging, and, on the other hand, the handling of multiband data observed at different resolution levels and coming from correlated sensors. We validate our model on synthetic images and on real multispectral images.
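
The MPM criterion named in this abstract assigns to each site the class whose posterior marginal is largest. The following is only a minimal sketch of that decision rule, assuming the posterior marginals have already been computed (in the paper they would come from the fuzzy hidden Markov tree, which is not reproduced here). The function name `mpm_segment`, the example values, and the restriction to discrete classes are assumptions made for illustration.

```python
def mpm_segment(posterior_marginals):
    """Apply the MPM decision rule to precomputed posterior marginals.

    posterior_marginals: one list per site, holding p(x_s = k | y) for
    each discrete class k. Computing these marginals is model-specific
    and is assumed to have been done elsewhere.
    """
    labels = []
    for marginals in posterior_marginals:
        # Pick the class index with the largest posterior marginal.
        best = max(range(len(marginals)), key=lambda k: marginals[k])
        labels.append(best)
    return labels


# Hypothetical marginals for 4 sites and 3 classes (illustrative values).
example_marginals = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.30, 0.30, 0.40],
    [0.05, 0.05, 0.90],
]
labels = mpm_segment(example_marginals)
print(labels)
```

Because the rule is applied site by site, it minimizes the expected number of mislabelled sites rather than seeking the single most probable joint labelling.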

I-Revues

Making Sense of Variations: Introducing Alternatives in Speech Synthesis

Author: Lanchantin Pierre
Obin Nicolas
Veaux Christophe
Publication date: 01/05/2012

This paper addresses the use of speech alternatives to enrich speech synthesis systems. Speech alternatives denote the variety of strategies that a speaker can use to pronounce a sentence, depending on pragmatic constraints, speaking style, and specific strategies of the speaker. During training, symbolic and acoustic characteristics of a unit-selection speech synthesis system are statistically modelled with context-dependent parametric models (GMMs/HMMs). During synthesis, symbolic and acoustic alternatives are exploited using a generalized Viterbi algorithm (GVA) to determine the sequence of speech units used for the synthesis. Objective and subjective evaluations support the claim that the use of speech alternatives significantly improves speech synthesis over conventional speech synthesis systems.

Edinburgh Research Explorer

Automatic Phoneme Segmentation with Relaxed Textual Constraints

Author: Lanchantin Pierre
Morris Andrew C.
Rodet Xavier
Veaux Christophe
Publication date: 01/01/2008

IRCAM internal reference: Lanchantin08a

Very high quality text-to-speech synthesis can be achieved by unit selection in a large recorded speech corpus [1]. This technique makes an optimal choice of speech units (e.g. phones) in the corpus and concatenates them to produce speech output. For various reasons, synthesis sometimes has to be done from existing recordings (rushes) and possibly without a text transcription. But, when possible, the text of the corpus and the speaker are carefully chosen for best phonetic and contextual coverage, for good voice quality and pronunciation, and the speaker is recorded in excellent conditions. Good phonetic coverage requires at least 5 hours of speech. Accurate segmentation of the phonetic units in such a large recording is a crucial step for speech synthesis quality. While this can be automated to some extent, it will generally require costly manual correction. This paper presents the development of an HMM-based phoneme segmentation system designed for corpus construction.

Edinburgh Research Explorer

Reconstructing Voices within the Multiple-Average-Voice-Model framework

Author: Gales Mark J F
King Simon
Lanchantin Pierre
Veaux Christophe
Yamagishi Junichi
Publication date: 01/09/2015

Edinburgh Research Explorer

Multiple-average-voice-based speech synthesis

Author: Gales Mark J F
King Simon
Lanchantin Pierre
Yamagishi Junichi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/05/2014

Crossref

Edinburgh Research Explorer

Dynamic Model Selection for Spectral Voice Conversion

Author: Lanchantin Pierre
Rodet Xavier
Publication venue: HAL CCSD
Publication date: 01/09/2010

IRCAM internal reference: Lanchantin10b

Objective Evaluation of the Dynamic Model Selection Method for Spectral Voice Conversion

Author: Pierre Lanchantin
Xavier Rodet
Publication date: 24/04/2020

Spectral voice conversion is usually performed using a single model selected in order to represent a tradeoff between goodness of fit and complexity. Recently, we proposed a new method for spectral voice conversion, called Dynamic Model Selection (DMS), in which we assumed that the model topology may change over time, depending on the source acoustic features. In this method a set of models with increasing complexity is considered during the conversion of a source speech signal into a target speech signal. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each source frame. In this paper, we present an objective evaluation demonstrating that this new method improves the conversion by reducing the transformation error compared to methods based on a single model.
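
The per-frame selection described in this abstract can be illustrated with a short sketch. The abstract does not state the selection criterion or the form of the conversion models, so everything below is an assumption for illustration: each hypothetical model is a pair of functions (a frame score and a frame conversion), and the model with the highest score is used for that frame. The names `convert_with_dynamic_selection`, `energy`, `simple_model`, and `complex_model` are invented here and are not the authors' method.

```python
def convert_with_dynamic_selection(frames, models):
    """Convert each source frame with the model that scores best on it.

    frames: list of feature vectors (lists of floats).
    models: list of (score_fn, convert_fn) pairs; both are hypothetical
    stand-ins for whatever criterion and mapping a real system uses.
    """
    converted, chosen = [], []
    for frame in frames:
        # Score every candidate model on the current source frame.
        scores = [score_fn(frame) for score_fn, _ in models]
        best = max(range(len(models)), key=lambda i: scores[i])
        chosen.append(best)
        converted.append(models[best][1](frame))
    return converted, chosen


def energy(frame):
    """Sum of squared feature values, used only by the toy scores below."""
    return sum(v * v for v in frame)


# Two toy models: one preferred on low-energy frames, one on high-energy frames.
simple_model = (lambda f: 1.0 - energy(f), lambda f: [v + 1.0 for v in f])
complex_model = (lambda f: energy(f), lambda f: [2.0 * v for v in f])

frames = [[0.1, 0.2], [1.0, 2.0]]
converted, chosen = convert_with_dynamic_selection(
    frames, [simple_model, complex_model]
)
print(chosen)
```

A single-model baseline corresponds to passing a list with one model, which makes the two approaches easy to compare on the same frames.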

CiteSeerX

Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion

Author: Lanchantin Pierre
Rodet Xavier
Publication venue: HAL CCSD
Publication date: 01/05/2011

IRCAM internal reference: Lanchantin11b